- .KF:chap4.toc
- .KW:59
- .N:93
- .XT:2
- .XB:0
- .X:10
- .L:59
- .M:1
- L---+----1----+----2----+----3----+----4----+----5T---+---R6----+----7----+----8
- .H:
- .H:
- .H:
- .F:
- .F:...$$$...
- .M:1
-
-
-
-
- CHAPTER 4
-
-
- MACHINE TRANSLATION OF MATTHEW 26:1-35
- .K:4. MACHINE TRANSLATION OF MATTHEW 26:1-35
-
-
-
- .M:2
- This chapter will discuss the implementation and
- theoretical basis of the machine translation program
- developed in conjunction with this thesis. The program
- accepts as its source a derivative of the text found in the
- Semantic Structure Analysis (SSA) displays of the previous
- chapter. This choice of source text is explained in the
- following section. A sample of the text is included in
- Appendix C. The program also references a specialized
- lexicon referred to in this thesis as a semanticon. A
- portion of the semanticon is included in Appendix D. A
- sample of the program's translated output (in Spanish) is
- included in Appendix E. A diskette containing the
- translation program and all the files necessary to run it is
- bound into the back of this thesis. The complete
- translation of Matthew 26:1-35 is included on the diskette. The
- contents of the diskette are outlined in Appendix F. A
- listing of the program, which is written in the ICON
- programming language, is contained in Appendix G.
- .H:
- .H: $$$
- .H:
- .F:
- .F:
-
-
- Theoretical Basis Of The Implementation
- .K: Theoretical Basis Of The Implementation
-
- A fundamental principle underlying the design of the
- machine translation program is the notion that it is
- reasonable to put a good deal of manual analysis into a
- text that will be translated into a multitude of target
- languages. An example of such a text is the Bible, which
- still has not been translated into some 3500 minority
- languages. Some other suitable candidates for this type of
- treatment are the legislation of the European Community,
- and owner's manuals for various products. A corollary to
- this first principle is the notion that any machine
- translation program will be more successful if the grammar of
- the source text is as limited as possible. In keeping with
- this corollary the syntax of the program's input text has
- been greatly simplified as set forth in the previous
- chapter about Semantic Structure Analysis.
-
- A second fundamental principle is that the program
- attempts to translate meaning rather than just words. This
- is because word-based machine translations often produce
- the wrong meaning due to ambiguities in the source text.
- Another problem with word-based translation programs is
- that they become large, complex, and slow because they must
- employ various techniques to try to minimize the errors
- which spring from ambiguities in the source text. One of
- the greatest problems with word-based translations is that
- they assume that surface structures are identical across
- languages. This ignores the fact that every language has
- its own devices for skewing the basic relations between
- concepts and propositions in producing surface structures,
- and the rules for such skewing are very context-sensitive.
- For these reasons, the program presented in this thesis
- attempts to translate meaning, and to that end, the
- analysis of the source text is based on the theory
- expounded in The Semantic Structure of Written Communication
- (SSWC) by Beekman, Callow, & Kopesec (1981).
-
- According to the SSWC, concepts/meanings come in
- four classes: things, events, attributes, and
- relations (1981:49). In their simplest forms things are
- represented by nouns, events by verbs, attributes by
- adjectives and adverbs, and relations by function words
- like conjunctions, sentence adverbs, and prepositions.
-
- A formidable problem for the translator presents
- itself when concepts are not represented in their simplest
- forms; this is called lexical skewing. For instance, in
- the sentence, 'John gave Mary some help' the word 'help' is
- really an event. A simpler (i.e. unskewed) way to express
- the same meaning would be, 'John helped Mary.'
-
- A linguistic universal could be claimed here. That
- is, all languages allow unskewed forms of expression, but
- no language allows all possible skewed forms of a concept.
- While it is beyond the scope of this thesis to attempt to
- prove the validity of this linguistic universal, there is
- ample anecdotal evidence to support it. For instance, in
- Spanish it is impossible to use the word for 'grape' as an
- adjective. So in Spanish one would never talk about 'grape
- wine', but one could express the concept in unskewed form
- as vino de uvas 'wine from grapes'.
-
- Another assumption underlying the implementation of
- this program is that the analysis of the source text will
- be done primarily by native speakers of the source
- language. Likewise, post-editing of the translated text
- will be performed primarily by native speakers of the
- target language. The role of any bilingual person involved
- in the translation process could be limited to that of
- consultant and translation checker. This approach has the
- obvious benefit of reducing the need for scarce, expensive
- bilingual translation specialists.
-
- The text that was translated as a part of this
- thesis represents something of a special case in that the
- analysis of the original text was, for obvious reasons, not
- done by native speakers of Koine Greek. Nevertheless, it
- could certainly be argued that the process of analyzing the
- original text would have been greatly simplified if such
- speakers of Koine Greek were still available. It should
- also be pointed out that the translation program does not
- accept the original text as its source text, but rather an
- English source text which is derived from the semantic
- structure analysis of the original Greek text. The current
- lack of native speakers of Koine Greek is precisely what
- motivates the use of an English rather than a Greek source
- text as input to the program.
-
- Finally, it is assumed that in its first draft a
- translation does not need to be perfect to be
- understandable. This is borne out by the experience of anyone who has
- found it necessary to communicate with a non-native speaker
- of his or her own language. Even though this speaker may
- have less than perfect control of the language,
- communication is often successful. Native speakers of a language
- seem to have a high degree of tolerance for imperfect
- grammar. The advantage of taking this position is that
- where imperfections in the grammar of the translated text
- are considered minor, they can simply be left to the post-
- editor to correct.
-
-
- Implementation Details
- .K: Implementation Details
-
- In the analysis of the English source text included
- with the program, an attempt was made to eliminate lexical
- skewing to the fullest extent possible. It should be noted
- that this is not entirely necessary when translating
- between closely related languages, but it becomes critical
- when translating into minority languages which may lack
- abstract nouns for events like 'love' or 'forgiveness'.
-
- As noted above, an attempt was also made to utilize
- a very limited syntax in the analysis of the source text.
- Ideally each sentence of the source text should consist of
- a subject, verb, objects, and possibly a relative clause.
- Passive voice was not permitted because it does not exist
- in all languages, nor does it always serve the same
- function.
-
- In an attempt to represent all concepts using words
- employed in their primary senses, figures of speech such
- as metaphors, idioms, euphemisms, and so on were spelled
- out. In many languages these would cause much confusion if
- translated literally. (In fact, figures of speech are
- simply a variation on the theme of lexical skewing.)
- Finally, conjunctions and sentence adverbs were used in a
- stylized manner (i.e. they always mean the same thing).
-
- To facilitate translation of meanings rather than
- words, a system utilizing connecting underscores and
- subscripting digits was employed in the preparation of the
- source text. For instance, 'chief_priests1' is treated as
- a single concept, and thus contains a connecting
- underscore. Such underscores represent the native speaker's
- judgement of how the source language words should be
- grouped into concepts. The subscripting digit '1' is
- added to distinguish this concept from any others which
- might possibly be renderable by the same English words.
- The subscripting digits used are somewhat arbitrary, but in
- the case of verbs the digits 1 through 3 were used for
- first, second, and third person singular verbs, and the
- digits 4 through 6 were used for the plural forms. Thus
- 'know6' would mean 'they know'.
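-
- The tagging convention just described can be illustrated with a
- short sketch. It is written in Python rather than ICON, and it is
- not taken from the program listed in Appendix G. The names
- split_tag and verb_person_number are hypothetical, and the sketch
- assumes that every tag ends in one or more digits.

```python
import re

# Assumed shape of a tag: English words joined by underscores, then digits.
TAG_PATTERN = re.compile(r"^([A-Za-z_]+?)(\d+)$")

# For verbs only: digits 1-3 are the singular persons, 4-6 the plural persons.
VERB_DIGITS = {
    1: (1, "singular"), 2: (2, "singular"), 3: (3, "singular"),
    4: (1, "plural"), 5: (2, "plural"), 6: (3, "plural"),
}

def split_tag(tag):
    """Split a semantic tag into (list of source words, subscript digit)."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"not a semantic tag: {tag!r}")
    words, digit = match.groups()
    return words.split("_"), int(digit)

def verb_person_number(tag):
    """Return (person, number) for a verb tag under the 1-6 digit scheme."""
    _, digit = split_tag(tag)
    return VERB_DIGITS[digit]
```

- Under these assumptions split_tag('chief_priests1') yields the words
- 'chief' and 'priests' with the digit 1, and
- verb_person_number('know6') yields third person plural.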
- .H:
- .H: $$$
- .H:
- .F:
- .F:
-
- Forms such as 'chief_priests1' and 'know6' are
- considered to be arbitrary symbols for units of meaning.
- They could just as easily have been rendered as 'abc1' and
- 'xyz6', but this would have resulted in an input text that
- was unreadable. Nevertheless, the idea that these symbols
- are arbitrary is important. For example, 'chief_priests1'
- may be rendered fairly literally in one language (i.e.
- sacerdotes principales in Spanish), but in another language
- the translation might sound more like 'honored old men who
- perform ceremonial rites'. The arbitrary forms used to
- represent meanings are called semantic tags in the program.
-
- Since the program is attempting to translate
- meanings rather than words, it uses an invention called a
- semanticon (see Appendix D) rather than a lexicon. Here is
- what an entry in the semanticon looks like:
-
- .M:1
-                                  |---- Morphological Tag
-                                  |
-                                  |          |---- Target
-                                  |          |     Language
-       Semantic Tag ----|         |          |     Sense
-                        |         |          |
-                     'feast1'    'n'    'la fiesta'
- .M:2
-
- Each entry in the semanticon begins with a semantic tag as
- described above. The next field in each entry is a
- morphological tag. A morphological tag is basically a part of
- speech, but it can contain additional information such as
- person, number, gender, tense, and so on. The
- morphological tag refers to the target language rendering of the
- concept represented by the semantic tag. This target
- language rendering may not strictly match its morphological
- tag in the traditional sense. For instance, sacerdotes
- principales 'priests principal' is not a noun in the
- traditional sense, but a combination of a noun plus an
- adjective. However, it functions as a single unit, and for
- this reason the conglomerate is treated as a noun in the
- semanticon. The next field in the semanticon entry is the
- target language rendering of the concept represented by the
- semantic tag. It generally contains a single target
- language word, but it may contain multiple words connected
- by underscores. If the morphological tag is 'n' for noun,
- the entry for the target language rendering consists of
- an article followed by one or more words connected by
- underscores which loosely represent a noun. If, in
- Spanish, the morphological tag is one of those for
- adjectives, the entry consists of four words: a masculine and a
- feminine singular adjective and a masculine and a feminine
- plural adjective.
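-
- The entry layout described above can be sketched as a small data
- structure. The sketch is in Python, not ICON, and is not the code
- of Appendix G. The 'feast1' entry is the one shown above; the
- 'new1' entry, the morphological tag 'adj', and the names Entry,
- parse_entry, load_semanticon, and adjective_form are assumptions
- made for illustration only.

```python
import re
from typing import NamedTuple

class Entry(NamedTuple):
    semantic_tag: str
    morph_tag: str
    rendering: str

# Each field of an entry is assumed to be enclosed in single quotes.
FIELD = re.compile(r"'([^']*)'")

def parse_entry(line):
    """Parse one semanticon line into its three fields."""
    fields = FIELD.findall(line)
    if len(fields) != 3:
        raise ValueError(f"expected three quoted fields: {line!r}")
    return Entry(*fields)

def load_semanticon(lines):
    """Read the whole semanticon into a dictionary keyed by semantic tag."""
    semanticon = {}
    for line in lines:
        if line.strip():
            entry = parse_entry(line)
            semanticon[entry.semantic_tag] = entry
    return semanticon

def adjective_form(entry, gender, number):
    """Pick one of the four adjective forms: m.sg, f.sg, m.pl, f.pl."""
    order = {("m", "sg"): 0, ("f", "sg"): 1, ("m", "pl"): 2, ("f", "pl"): 3}
    return entry.rendering.split()[order[(gender, number)]]

SAMPLE = [
    "'feast1' 'n' 'la fiesta'",
    "'new1' 'adj' 'nuevo nueva nuevos nuevas'",  # hypothetical entry
]
```

- Keying the table by semantic tag means that each concept in a
- sentence can be found with a single lookup.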
-
- The source language text to be translated (see
- Appendix C) contains braces. These braces are used to
- delimit portions of the text which should be translated as
- a unit. For instance, noun phrases and prepositional
- phrases are surrounded by braces, and the main clause is
- surrounded by braces unless it is the only clause in its
- source line. The program translates each span of text
- surrounded by braces as a unit. For example, if a noun phrase is
- surrounded by braces, the program will never make the article
- of that noun phrase agree with a noun which is outside that
- noun phrase.
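-
- One way to picture the role of the braces is to read a source line
- into nested lists, one list per brace pair. The following Python
- sketch is an illustration only and does not reproduce the ICON
- program. The function name parse_segments and the sample line are
- hypothetical; the actual source text is shown in Appendix C.

```python
def parse_segments(text):
    """Turn brace-delimited source text into nested lists of tags."""
    tokens = text.replace("{", " { ").replace("}", " } ").split()
    root = []
    stack = [root]                 # stack[-1] is the segment being filled
    for token in tokens:
        if token == "{":
            child = []
            stack[-1].append(child)
            stack.append(child)
        elif token == "}":
            if len(stack) == 1:
                raise ValueError("unmatched closing brace")
            stack.pop()
        else:
            stack[-1].append(token)
    if len(stack) != 1:
        raise ValueError("unmatched opening brace")
    return root

# Hypothetical source line, not quoted from Appendix C.
line = "{the_chief_priests1} {plot6 {against1 Jesus1}}"
segments = parse_segments(line)
```

- Because each brace pair becomes its own list, an agreement rule
- applied to one list cannot reach a noun that lies in another.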
-
-
- Program Operation
- .K: Program Operation
-
- The program first opens all of its files, and then
- reads the entire semanticon into memory. (Some experienced
- programmers may cringe at the thought of reading the entire
- semanticon into memory, but memory has become a very
- inexpensive commodity, and its copious use greatly
- accelerates program execution.) Next, a sentence of
- untranslated source text is read into memory, and the sentence is
- placed into an ICON list structure. Each element of this
- list structure represents one concept (i.e. word or words)
- from the source sentence. (A description of list
- structures is outside the scope of this thesis, but use of this
- structure greatly reduces the programming burden that would
- result if sentences were represented as strings.) Next,
- each concept is looked up in the semanticon, and the
- information obtained from the semanticon is added to the
- list.
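-
- The lookup step can be sketched as follows. This is a Python
- illustration, not the ICON code of Appendix G. The renderings for
- 'feast1' and 'know6' come from this chapter; the morphological tag
- 'v', the name gloss_sentence, and the handling of tags that are
- missing from the semanticon are assumptions.

```python
# Hypothetical fragment: semantic tag -> (morphological tag, rendering).
SEMANTICON = {
    "feast1": ("n", "la fiesta"),
    "know6": ("v", "saben"),
}

def gloss_sentence(concepts, semanticon):
    """Attach semanticon information to each concept in a sentence."""
    glossed = []
    for tag in concepts:
        # Assumption: an unknown tag is passed through and marked '?' so
        # that a post-editor can spot it. The thesis does not specify this.
        morph_tag, rendering = semanticon.get(tag, ("?", tag))
        glossed.append([tag, morph_tag, rendering])
    return glossed
```

- Each element of the result holds the source concept together with
- its target language information, much like one column of an
- interlinear gloss.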
-
- At this point the structure which the program has
- created is analogous to a sentence in the source language
- with target language glosses beneath each word. The
- program next segments the text based on the position of
- braces within the text. When a segment of text is located
- which contains no further sub-segments (delimited by
- braces) that segment is translated. Translation involves a
- number of processes including adjustments to word order,
- word agreement, capitalization, punctuation, and phonology.
- When all the segments of a line of text have been
- translated, they are assembled into a string, and written to the
- output file. This process is repeated until all the input
- text has been translated.
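-
- The order of processing described above (innermost segments
- first, then assembly) can be sketched in Python. This is a
- simplified illustration and not the ICON program: it handles only
- noun and adjective order, adjective agreement, capitalization, and
- final punctuation. All names and all entries except 'feast1' are
- hypothetical, and the input is assumed to be already segmented
- into nested lists.

```python
# Hypothetical mini-semanticon: semantic tag -> (morphological tag, rendering).
GLOSSES = {
    "feast1": ("n", "la fiesta"),
    "prepare6": ("v", "preparan"),
    "new1": ("adj", "nuevo nueva nuevos nuevas"),
}

# The article of a noun entry selects one of the four adjective forms
# (masculine singular, feminine singular, masculine plural, feminine plural).
FORM_INDEX = {"el": 0, "la": 1, "los": 2, "las": 3}

def translate_leaf(tags):
    """Translate a run of tags that contains no sub-segments."""
    words, pending = [], []
    for tag in tags:
        morph, rendering = GLOSSES[tag]
        if morph == "adj":
            pending.append(rendering.split())    # hold until the noun appears
        elif morph == "n":
            index = FORM_INDEX[rendering.split()[0]]
            words.append(rendering)              # article plus noun
            words.extend(forms[index] for forms in pending)
            pending = []
        else:
            words.append(rendering)
    words.extend(forms[0] for forms in pending)  # no noun found: default form
    return " ".join(words)

def translate(segment):
    """Translate innermost segments first, then assemble the parent."""
    parts, run = [], []
    for item in segment:
        if isinstance(item, list):
            if run:
                parts.append(translate_leaf(run))
                run = []
            parts.append(translate(item))
        else:
            run.append(item)
    if run:
        parts.append(translate_leaf(run))
    return " ".join(parts)

def finish(text):
    """Capitalize the first letter and add final punctuation."""
    return text[0].upper() + text[1:] + "."

sentence = finish(translate([["prepare6"], ["new1", "feast1"]]))
```

- Here the adjective is moved after its noun and agrees with it, but
- only within the segment that contains both.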
-
-
- Critique
- .K: Critique
-
- From the discussion above it can be discerned that
- the program translates one sentence at a time. Thus it
- might seem that all discourse considerations (i.e.
- relationships between units larger than a sentence) have been
- ignored. However, this is not the case. It is true that
- because of the great similarity between the languages and
- cultures of English and Spanish speakers, the differences
- in discourse structure between the two languages are
- minimal. Nevertheless, it can be argued that discourse
- considerations have not been completely ignored because the
- analysis performed on the text prior to translation
- produced sentence adverbs and conjunctions that are used in
- a stylized (i.e. consistent) manner. Thus the relationship
- of any clause introduced by one of these sentence adverbs
- or conjunctions to the preceding discourse should come
- through clearly in the translation.
-
- On the other hand, there will be problems using this
- approach with languages which employ an oral style of story
- telling in which certain information is repeated several
- times. I am inclined to solve this problem by making
- adaptations to the source text rather than to the program
- because the majority of the world's languages will not
- require this accommodation, and the ones which do will
- undoubtedly differ in their requirements.
-
- Another discourse consideration which deserves
- attention is that of pronominal reference. An example of
- the problem is, 'The disciples prepared the passover meal.
- Later they ate it.' When rendering the second sentence
- into Spanish the translation of 'it' would need to be
- feminine 'la' to make it agree in gender with the
- translation of the word for meal 'comida'. However, in another
- language the word for meal might be masculine or neuter in
- gender.
-
- In the current version of the translation program
- this issue has been deliberately ignored, but only because
- the current version of the program is intended primarily to
- prove the feasibility of translating fixed texts into
- multiple languages by means of a computer program. Dealing
- with the problem of participant reference increases the
- size and complexity of not only the program but the source
- text as well. This would make the program harder to
- understand, and the source text harder to read. Pronominal
- reference will be dealt with in the next version of the
- program.
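-
- The current program does not resolve pronouns. Purely as an
- illustration of the problem, the following Python sketch shows one
- hypothetical approach for Spanish: if the antecedent of a pronoun
- were marked in the source text, the gender and number of the
- object pronoun could be read from the article stored with the
- antecedent's semanticon entry. The function name and the table are
- assumptions, not part of the thesis program.

```python
# Spanish direct object pronouns, keyed by the article of the antecedent.
OBJECT_PRONOUNS = {"el": "lo", "la": "la", "los": "los", "las": "las"}

def object_pronoun(antecedent_rendering):
    """Choose an object pronoun that agrees with the antecedent noun.

    antecedent_rendering is a noun entry such as 'la comida', i.e. an
    article followed by the noun, as in the semanticon.
    """
    article = antecedent_rendering.split()[0]
    return OBJECT_PRONOUNS[article]
```

- With the entry 'la comida' the sketch selects 'la'. It would give
- the wrong result for feminine nouns that take the article 'el',
- such as 'el agua', so a real implementation would need gender to
- be recorded explicitly.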
-
- The next version of the program will also employ
- markers in the source text for semantic roles like agent
- and patient. This will make it possible to translate into
- languages that are ergative-absolutive. Such languages use
- coding schemes which are entirely different from those of English.
- For instance, in English the agent of any active sentence
- is normally coded, at the surface level, in the nominative
- (i.e. subject) case. However, in an ergative-absolutive
- language the agent may be realized in the ergative case at
- the surface level if it is the subject of a transitive
- verb, but it may be realized in the absolutive case if it
- is the subject of an intransitive verb (one which doesn't
- take an object).
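-
- The difference between the two coding schemes can be stated as a
- small table. The Python sketch below is an illustration only; the
- function name, the role labels, and the alignment labels are
- assumptions, and it covers only active sentences with an agent or
- a patient.

```python
# A = subject of a transitive verb, S = subject of an intransitive verb,
# P = object of a transitive verb.
CASE_TABLE = {
    "nominative-accusative": {"A": "nominative", "S": "nominative",
                              "P": "accusative"},
    "ergative-absolutive": {"A": "ergative", "S": "absolutive",
                            "P": "absolutive"},
}

def surface_case(role, verb_is_transitive, alignment):
    """Map a semantic role to a surface case for the given alignment."""
    if role not in ("agent", "patient"):
        raise ValueError(f"unsupported role: {role!r}")
    if not verb_is_transitive:
        function = "S"             # the only argument is the subject
    elif role == "agent":
        function = "A"
    else:
        function = "P"
    return CASE_TABLE[alignment][function]
```

- With role markers in the source text, the same sentence could be
- coded correctly for either kind of language by changing only the
- alignment table.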
-
-
- Implementing A New Target Language
- .K: Implementing A New Target Language
-
- To make the program translate into some other
- language such as French, it would first be necessary to
- change the semanticon to contain French renderings for the
- semantic tags. (The semanticon can be changed with a text
- editor.) Note that French requires explicit subject
- pronouns. For instance, the entry for 'know6' would need
- to contain two words meaning 'they know' rather than the
- single Spanish word saben.
-
- Also, for some languages (not necessarily for
- French) it may be necessary that some concepts be expressed
- more specifically than is required in English. For
- instance, it may not be possible to simply talk about a
- 'brother'. It may be necessary to specify 'older brother'
- or 'younger brother'. In such situations it will be
- necessary to edit the source text to include semantic tags
- ('brother1' and 'brother2') which identify the more specific
- concepts. Fortunately, this does not render the enhanced
- source text unusable for languages which do not require
- this additional information. In such cases semantic tags
- like 'brother1' and 'brother2' can simply be translated
- into the target language equivalent of 'brother'.
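-
- The point can be shown with two hypothetical semanticon fragments
- in Python. The assignment of 'brother1' to the older brother and
- 'brother2' to the younger is assumed, and the second fragment uses
- placeholder strings rather than words from any real language.

```python
# Spanish does not need the distinction, so both tags share one rendering.
SPANISH = {
    "brother1": "el hermano",
    "brother2": "el hermano",
}

# Placeholders for a language that must distinguish the two concepts.
DISTINGUISHING_LANGUAGE = {
    "brother1": "<word for older brother>",
    "brother2": "<word for younger brother>",
}

def render(tags, semanticon):
    """Look up the target language rendering of each semantic tag."""
    return [semanticon[tag] for tag in tags]
```

- The same enhanced source text serves both languages; only the
- semanticon differs.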
-
- After this is done, it would still be necessary to make
- some program modifications, but they should not be too
- formidable for a closely related language like French.
- First of all, the program has some global variables
- containing Spanish articles. These would need to be
- changed to contain their French counterparts, but it would
- probably not be necessary to change the identifier names of
- these global variables. Second, it would be necessary to
- modify the procedure contract(), because the rules for
- contraction are different in French. Likewise, the
- procedure phono_adj() which makes phonological adjustments
- (like 'a house' but 'an hour') would have to be modified to
- follow French rules. Finally, the procedures which correct
- word order (order() and the procedures it calls) would also
- need to be modified to accommodate French word order. None
- of the required modifications should be very time-consuming
- since the entire program was written for Spanish in just
- fifteen days.
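-
- To give a sense of the scale of such a change, the following
- Python sketch shows a table-driven contraction step. It is not the
- contract() procedure of Appendix G, which is written in ICON; the
- function signature and the rule tables are assumptions. The French
- table ignores elision before vowels (for example l'), which a real
- implementation would have to handle.

```python
# (preposition, article) -> contracted form
SPANISH_RULES = {("a", "el"): "al", ("de", "el"): "del"}
FRENCH_RULES = {
    ("à", "le"): "au", ("de", "le"): "du",
    ("à", "les"): "aux", ("de", "les"): "des",
}

def contract(words, rules):
    """Merge a preposition and a following article where the rules say so."""
    result = []
    for word in words:
        key = (result[-1].lower(), word.lower()) if result else None
        if key in rules:
            result[-1] = rules[key]   # replace the preposition with the merger
        else:
            result.append(word)
    return result
```

- In this arrangement, moving from Spanish to French means supplying
- a different rule table rather than rewriting the procedure.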
-
-
- Conclusion
- .K: Conclusion
-
- I have attempted to show some of the theoretical basis
- for producing machine translations, and to demonstrate the
- feasibility of translating fixed texts into multiple target
- languages using a computer program as a translation aid. I
- have demonstrated, via the translated text in Appendix E,
- that such fixed texts can be translated with a high degree
- of quality if the source text is adequately pre-analyzed.
- I have also asserted that the pre-analysis can be performed
- by persons who are native speakers of only the source
- language (i.e. English), and who may have no knowledge of
- any of the intended target languages. Likewise, I
- have contended that any required post-editing can be done
- by persons who are fluent in only the target language, and
- the role of any bilingual specialists could be limited to
- that of consultants and translation checkers. Considering
- all of these points, it should be possible to produce
- translations of fixed texts into multiple languages using a
- machine translation program as a translation aid and to do
- so more quickly, more consistently, and at a lower cost
- than by traditional methods.